- 
                Notifications
    You must be signed in to change notification settings 
- Fork 201
fix: Support both reference.dict and reference.fasta.dict in GRIDSS wrappers #4426
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
fix: Support both reference.dict and reference.fasta.dict in GRIDSS wrappers #4426
Conversation
Reference setup has changed to produce reference.fasta.dict instead of reference.dict. Here I have changed it to just check the existance of the given dictionary without asuming a fixed ending like .dict.
| 📝 WalkthroughWalkthroughRenamed Snakemake inputs in the assemble rule (bams → bam, bais → bai). Assemble wrapper updated to use  Changes
 Sequence Diagram(s)sequenceDiagram
    autonumber
    actor Snakemake
    participant AssembleWrapper as assemble/wrapper.py
    participant CallWrapper as call/wrapper.py
    participant PreprocessWrapper as preprocess/wrapper.py
    participant FS as FileSystem
    Note over Snakemake,AssembleWrapper: Assemble rule invokes wrapper with renamed inputs
    Snakemake->>AssembleWrapper: pass inputs (bam, bai, reference, dictionary?)
    AssembleWrapper->>FS: check dictionary path (reference + ".dict" or provided)
    FS-->>AssembleWrapper: exists / missing
    alt dictionary exists
        AssembleWrapper->>FS: read BAM via snakemake.input.bam
        AssembleWrapper->>Snakemake: execute GridSS with {snakemake.input.bam}
    else missing
        AssembleWrapper-->>Snakemake: raise error "Dictionary ... missing"
    end
    Note over CallWrapper,PreprocessWrapper: Both now validate a provided dictionary path
    Snakemake->>CallWrapper: pass reference, dictionary
    CallWrapper->>FS: check provided dictionary
    Snakemake->>PreprocessWrapper: pass reference, dictionary
    PreprocessWrapper->>FS: check provided dictionary (error message updated)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. ✨ Finishing Touches
 🧪 Generate unit tests
 Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit: 
 SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type  Other keywords and placeholders
 CodeRabbit Configuration File ( | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️  Outside diff range comments (1)
bio/gridss/assemble/wrapper.py (1)
34-41: Do not derive the dictionary from the reference; use the provided input dictionary and fix the error messageDeriving the dictionary with reference + ".dict" contradicts the PR objective and breaks cases where the dictionary is reference.dict. Also, the current error message appends “.dict” to {dictionary}, leading to “.dict.dict” in output.
Apply this diff to accept arbitrary provided dictionary paths and improve validation:
-dictionary = reference + ".dict" -if not path.exists(dictionary): - raise ValueError( - "{dictionary}.dict missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format( - dictionary=dictionary - ) - ) +dictionary = snakemake.input.get("dictionary") +if not dictionary: + raise ValueError("Please set input.dictionary to provide reference dictionary.") +if not path.exists(dictionary): + raise ValueError( + "Dictionary {dictionary} missing. Ensure the reference dictionary exists (e.g., reference.dict or reference.fasta.dict). You can create it with Picard CreateSequenceDictionary.".format( + dictionary=dictionary + ) + )
🧹 Nitpick comments (3)
bio/gridss/preprocess/wrapper.py (1)
34-41: Error message implies a single naming scheme; make it neutral and accept both reference.dict and reference.fasta.dictThe current message suggests only reference.fa.dict. Since we now accept any provided dictionary path, the message should not prescribe a specific naming convention.
Apply this diff to clarify the guidance:
- "Dictionary {dictionary} missing or wrong naming (reference.fa.dict). Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format( + "Dictionary {dictionary} missing. Ensure the reference dictionary exists (e.g., reference.dict or reference.fasta.dict). You can create it with Picard CreateSequenceDictionary.".format( dictionary=dictionary )Additionally, consider validating presence of the input key to avoid a TypeError if input.dictionary is omitted:
# Insert after line 19 (after retrieving `dictionary`) if not dictionary: raise ValueError("Please set input.dictionary to provide reference dictionary.")bio/gridss/call/wrapper.py (1)
34-40: Accept arbitrary dictionary paths; update error message accordinglyGood move sourcing the dictionary from Snakemake input. The error message should not append “.dict” or imply a suffix; make it neutral to support both reference.dict and reference.fasta.dict. Also, guard against missing input.dictionary to avoid a TypeError in path.exists(None).
Apply this diff to update the message:
- "{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format( + "Dictionary {dictionary} missing. Ensure the reference dictionary exists (e.g., reference.dict or reference.fasta.dict). You can create it with Picard CreateSequenceDictionary.".format( dictionary=dictionary )And add a presence check right after retrieving the dictionary (outside this hunk):
# Insert after line 19 (after retrieving `dictionary`) if not dictionary: raise ValueError("Please set input.dictionary to provide reference dictionary.")bio/gridss/assemble/wrapper.py (1)
48-48: Typo: “consistency”Minor spelling fix in the comment.
- "{snakemake.input.bam} " # changed to bam for consistancy + "{snakemake.input.bam} " # changed to bam for consistency
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- bio/gridss/assemble/test/Snakefile(1 hunks)
- bio/gridss/assemble/wrapper.py(2 hunks)
- bio/gridss/call/wrapper.py(1 hunks)
- bio/gridss/preprocess/wrapper.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
⚙️ CodeRabbit Configuration File
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
- bio/gridss/call/wrapper.py
- bio/gridss/assemble/wrapper.py
- bio/gridss/preprocess/wrapper.py
**/wrapper.py
⚙️ CodeRabbit Configuration File
Do not complain about use of undefined variable called
snakemake.
Files:
- bio/gridss/call/wrapper.py
- bio/gridss/assemble/wrapper.py
- bio/gridss/preprocess/wrapper.py
🧠 Learnings (2)
📚 Learning: 2025-06-04T06:32:20.090Z
Learnt from: rohan-ibn-tariq
PR: snakemake/snakemake-wrappers#4160
File: bio/trf/wrapper.py:18-26
Timestamp: 2025-06-04T06:32:20.090Z
Learning: For Snakemake wrappers, it's preferable to keep parameter dictionaries and constants directly visible in wrapper.py files rather than importing from config modules, to maintain minimal, self-documenting code that doc viewers can understand at a glance.
Applied to files:
- bio/gridss/call/wrapper.py
📚 Learning: 2024-11-26T14:59:03.678Z
Learnt from: tdayris
PR: snakemake/snakemake-wrappers#3502
File: bio/ngsbits/sampleancestry/wrapper.py:18-23
Timestamp: 2024-11-26T14:59:03.678Z
Learning: In Snakemake wrapper scripts, Snakemake validates input and output paths, so explicit shell quoting is not necessary.
Applied to files:
- bio/gridss/assemble/wrapper.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Summary
🔇 Additional comments (1)
bio/gridss/assemble/test/Snakefile (1)
36-37: LGTM: input keys renamed to singular while still providing listsUsing singular keys (bam/bai) while supplying lists via expand is valid in Snakemake and aligns with the updated wrapper usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
bio/gridss/assemble/wrapper.py (1)
37-37: Good: clearer error message with f-stringThe updated error text is clear and actionable, and the f-string improves readability.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
- bio/gridss/assemble/wrapper.py(2 hunks)
- bio/gridss/call/wrapper.py(1 hunks)
- bio/gridss/preprocess/wrapper.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- bio/gridss/call/wrapper.py
- bio/gridss/preprocess/wrapper.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
⚙️ CodeRabbit Configuration File
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
- bio/gridss/assemble/wrapper.py
**/wrapper.py
⚙️ CodeRabbit Configuration File
Do not complain about use of undefined variable called
snakemake.
Files:
- bio/gridss/assemble/wrapper.py
🧠 Learnings (3)
📚 Learning: 2024-11-26T14:59:03.678Z
Learnt from: tdayris
PR: snakemake/snakemake-wrappers#3502
File: bio/ngsbits/sampleancestry/wrapper.py:18-23
Timestamp: 2024-11-26T14:59:03.678Z
Learning: In Snakemake wrapper scripts, Snakemake validates input and output paths, so explicit shell quoting is not necessary.
Applied to files:
- bio/gridss/assemble/wrapper.py
📚 Learning: 2024-11-26T08:31:00.099Z
Learnt from: tdayris
PR: snakemake/snakemake-wrappers#3496
File: bio/mtnucratio/test/Snakefile:2-6
Timestamp: 2024-11-26T08:31:00.099Z
Learning: In test files for Snakemake wrappers, such as `bio/mtnucratio/test/Snakefile`, hard-coded input and output paths are acceptable as examples and do not need to use wildcards to make paths flexible.
Applied to files:
- bio/gridss/assemble/wrapper.py
📚 Learning: 2024-11-26T09:16:39.570Z
Learnt from: tdayris
PR: snakemake/snakemake-wrappers#3499
File: bio/ngscheckmate/makesnvpattern/test/Snakefile:14-22
Timestamp: 2024-11-26T09:16:39.570Z
Learning: In the `snakemake-wrappers` repository, when writing test `Snakefile`s (e.g., `bio/ngscheckmate/makesnvpattern/test/Snakefile`), hardcoded input/output paths are acceptable because these are examples and the IO can be chosen freely.
Applied to files:
- bio/gridss/assemble/wrapper.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: testing
- GitHub Check: docs
- GitHub Check: Summary
🔇 Additional comments (1)
bio/gridss/assemble/wrapper.py (1)
46-46: Fix leftover plural input references in call wrapperThe search uncovered a lingering reference to
snakemake.input.bamsin the call wrapper, which needs to be aligned with the singularsnakemake.input.bamchange:• bio/gridss/call/wrapper.py:48
Replace"{snakemake.input.bams} "with
"{snakemake.input.bam} "After updating, please rerun the pipeline/tests to ensure there are no more
bamsorbaisusages.⛔ Skipped due to learnings
Learnt from: johanneskoester PR: snakemake/snakemake-wrappers#3498 File: bio/ngscheckmate/ncm/wrapper.py:68-68 Timestamp: 2024-12-06T14:25:43.922Z Learning: In the `bio/ngscheckmate/ncm/wrapper.py` file for the NGSCheckMate wrapper, do not suggest adding file existence checks for the list file input, as the files are already required by the wrapper.Learnt from: tdayris PR: snakemake/snakemake-wrappers#3502 File: bio/ngsbits/sampleancestry/wrapper.py:18-23 Timestamp: 2024-11-26T14:59:03.678Z Learning: In Snakemake wrapper scripts, Snakemake validates input and output paths, so explicit shell quoting is not necessary.
| dictionary = reference + ".dict" | ||
| if not path.exists(dictionary): | ||
| raise ValueError( | ||
| "{dictionary}.dict missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format( | ||
| dictionary=dictionary | ||
| ) | ||
| f"{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard." | ||
| ) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Support both reference.fa.dict and reference.dict (and an explicit input) to avoid regressions
The current check only accepts reference + ".dict". That breaks setups that still produce reference.dict, which this PR aims to support. Recommend trying both naming conventions and, if present, honoring an explicit snakemake.input.dictionary (aligning with call/preprocess wrappers).
Apply this diff to make the check robust and backward-compatible:
-dictionary = reference + ".dict"
-if not path.exists(dictionary):
-    raise ValueError(
-        f"{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard."
-    )
+dictionary = snakemake.input.get("dictionary")
+if dictionary:
+    if not path.exists(dictionary):
+        raise ValueError(
+            f"{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard."
+        )
+else:
+    candidates = [f"{reference}.dict", f"{path.splitext(reference)[0]}.dict"]
+    if not any(path.exists(d) for d in candidates):
+        raise ValueError(
+            "Reference dictionary not found. Looked for: {}. Please create it (e.g., with Picard CreateSequenceDictionary).".format(
+                ", ".join(candidates)
+            )
+        )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| dictionary = reference + ".dict" | |
| if not path.exists(dictionary): | |
| raise ValueError( | |
| "{dictionary}.dict missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format( | |
| dictionary=dictionary | |
| ) | |
| f"{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard." | |
| ) | |
| # before this block, reference is defined above | |
| # replace the original dictionary existence check with: | |
| dictionary = snakemake.input.get("dictionary") | |
| if dictionary: | |
| if not path.exists(dictionary): | |
| raise ValueError( | |
| f"{dictionary} missing. Please make sure the reference dictionary was properly created. " | |
| "This can be accomplished for example by CreateSequenceDictionary.jar from Picard." | |
| ) | |
| else: | |
| candidates = [f"{reference}.dict", f"{path.splitext(reference)[0]}.dict"] | |
| if not any(path.exists(d) for d in candidates): | |
| raise ValueError( | |
| "Reference dictionary not found. Looked for: {}. Please create it (e.g., with Picard CreateSequenceDictionary).".format( | |
| ", ".join(candidates) | |
| ) | |
| ) | 
🤖 Prompt for AI Agents
In bio/gridss/assemble/wrapper.py around lines 34 to 38, the current check only
looks for reference + ".dict" which breaks setups that use reference.dict or
provide an explicit snakemake.input.dictionary; update the logic to first prefer
and honor an explicit snakemake.input.dictionary if present, then check for both
reference + ".fa.dict" and reference + ".dict" (or other common variants) and
use the first one found, raising the missing-file ValueError only if none of
these exist; ensure the chosen dictionary path is assigned to the same variable
used downstream.
| # dictionary = path.splitext(reference)[0] + ".dict" --> leads to incorrect paths (reference.dict), which is incompatible with setup reference (reference.fa.dict) | ||
| if not path.exists(dictionary): | ||
| raise ValueError( | ||
| "{dictionary}.dict missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format( | ||
| dictionary=dictionary | ||
| ) | ||
| f"{dictionary} missing or wrong naming (reference.fa.dict). Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard." | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # dictionary = path.splitext(reference)[0] + ".dict" --> leads to incorrect paths (reference.dict), which is incompatible with setup reference (reference.fa.dict) | |
| if not path.exists(dictionary): | |
| raise ValueError( | |
| "{dictionary}.dict missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format( | |
| dictionary=dictionary | |
| ) | |
| f"{dictionary} missing or wrong naming (reference.fa.dict). Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard." | 
| dictionary = reference + ".dict" | ||
| if not path.exists(dictionary): | ||
| raise ValueError( | ||
| "{dictionary}.dict missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format( | ||
| dictionary=dictionary | ||
| ) | ||
| f"{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard." | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| dictionary = reference + ".dict" | |
| if not path.exists(dictionary): | |
| raise ValueError( | |
| "{dictionary}.dict missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format( | |
| dictionary=dictionary | |
| ) | |
| f"{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard." | 
| #dictionary = path.splitext(reference)[0] + ".dict" | ||
| if not path.exists(dictionary): | ||
| raise ValueError( | ||
| "{dictionary}.dict missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format( | ||
| dictionary=dictionary | ||
| ) | ||
| f"{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard." | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| #dictionary = path.splitext(reference)[0] + ".dict" | |
| if not path.exists(dictionary): | |
| raise ValueError( | |
| "{dictionary}.dict missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format( | |
| dictionary=dictionary | |
| ) | |
| f"{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard." | 
| reference="reference/genome.fasta", | ||
| dictionary="reference/genome.dict", | ||
| indices=multiext("reference/genome.fasta", *reference_index_endings), | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| reference="reference/genome.fasta", | |
| dictionary="reference/genome.dict", | |
| indices=multiext("reference/genome.fasta", *reference_index_endings), | |
| ref="reference/genome.fasta", | |
| dict="reference/genome.dict", | |
| idx=multiext("reference/genome.fasta", *reference_index_endings), | 
| assembly_others=expand("{working_dir}/group.bam.gridss.working/group.bam{ending}", working_dir=[WORKING_DIR], ending=assembly_endings) | ||
| params: | ||
| extra="--jvmheap 1g", | ||
| workingdir=WORKING_DIR | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be a temporary directory?
Reference setup has changed to produce reference.fasta.dict instead of reference.dict.
Here I have changed it to just check the existance of the given dictionary without asuming a fixed ending like .dict.
Summary by CodeRabbit
Refactor
Bug Fixes
Documentation